An Algorithm for Translating Chemical Names to Molecular Formulas (Thesis)
Abstract
An algorithm for translating directly from chemical names to molecular formulas is described. The validity of the algorithm was tested both manually and by computer. Molecular formulas of several hundred randomly selected chemicals were calculated successfully, verifying the linguistic analyses and the logic of the computer program. The algorithm for manual translation consists of eight simple operations. The procedure enables non-chemists to compute molecular formulas quickly without drawing structural diagrams. The machine translation routine is rapid and requires a program of fewer than 1000 instructions. If the experimental dictionary were expanded to include low-frequency morphemes, formulas for all chemical names could be handled. The problem of chemical nomenclature is discussed in terms of the information requirements of chemists. The approach of the linguist to the problem of nomenclature is contrasted with that of the chemist. It is shown that there is only one language of chemical nomenclature, though there exist many systems of nomenclature. The difficulties in syntactically analyzing Chemical Abstracts (C.A.) nomenclature result from C.A.'s ambiguous use of morphemes such as imino, not from the use of so-called trivial nomenclature. The more systematic I.U.P.A.C. nomenclature includes idiomatic expressions but eliminates all homonymous expressions. [...] of the most frequently occurring segments. Approximately forty morphemes such as o, e, yl and allomorphs such as thi and sulf were isolated. A list of their 200 actual co-occurrences was compiled. These studies are particularly valuable in identifying idiomatic expressions such as diaz, the meaning of which cannot be computed from the referential meanings of di and az. Morpheme classes are illustrated by the bonding morphemes (an, en, yn, ium, etc.) and the homologous alkyl morphemes meth, eth, prop, but, etc.
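The idea of computing a formula from morphemes rather than from a structural diagram can be illustrated with a minimal Python sketch. It is not the eight-operation procedure of the thesis, which the abstract does not spell out. It assumes a hypothetical four-entry dictionary of alkyl morphemes (carbon counts) and three bonding morphemes (hydrogens lost relative to the alkane), and it handles only unbranched hydrocarbon names without locants. The names `ROOTS`, `BONDS`, and `formula_from_name` are invented for this illustration.

```python
# Hypothetical mini-dictionary for illustration only; the thesis's actual
# dictionary and its eight operations are not reproduced here.
ROOTS = {"meth": 1, "eth": 2, "prop": 3, "but": 4}  # alkyl morpheme -> carbon count
BONDS = {"an": 0, "en": 2, "yn": 4}  # bonding morpheme -> hydrogens lost vs. the alkane


def formula_from_name(name: str) -> str:
    """Return the molecular formula of a simple unbranched hydrocarbon name.

    The name must have the shape <alkyl morpheme><bonding morpheme>e,
    e.g. "propane", "butene", "ethyne" (locants are not supported).
    """
    name = name.lower().strip()
    for root, carbons in ROOTS.items():
        for bond, h_loss in BONDS.items():
            if name == root + bond + "e":
                if h_loss and carbons < 2:
                    # A double or triple bond needs at least two carbons.
                    raise ValueError(f"no multiple bond possible in {name!r}")
                hydrogens = 2 * carbons + 2 - h_loss
                carbon_part = "C" if carbons == 1 else f"C{carbons}"
                return f"{carbon_part}H{hydrogens}"
    raise ValueError(f"name not covered by this sketch: {name!r}")


if __name__ == "__main__":
    for example in ("methane", "propene", "ethyne", "butane"):
        print(example, formula_from_name(example))
```

Each morpheme contributes a fixed increment to the atom counts, so the formula follows from the segmented name alone. Extending the sketch to real nomenclature would need a much larger dictionary and the syntactic analysis the abstract goes on to describe.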
The syntactic analyses include the demonstration of transformational properties in chemical nomenclature, e.g. in primary amines (R-NH2), where aminoRane = Rylamine. To complete the grammar one would have to expand the inventory of morphemes, the morpheme classes, and the list of transformations. Chemical name recognition is not simply a word-for-word translation procedure. Rather, the syntactic analysis required is comparable to the procedure employed by Harris, Hiz, et al. (Transformations and Discourse Analysis Projects, Univ. of Pennsylvania) for normal English discourse. The structural linguistic data is supported by a summary of I.U.P.A.C. rules for generating chemical names. In order to relate this study to the general problem of chemical information retrieval, the historical development of chemical nomenclature is traced from the 1892 Geneva Conference to the present. The relationship between nomenclature, notation, indexing and searching (retrieval) systems is discussed. In particular, the need for linguistic studies to solve the intellectual facet of the "retrieval" problem is discussed in contrast with the manipulative aspects, which are more readily amenable to machine handling. The problem of synonymy in chemical nomenclature must be resolved if computable syntactic analyses of chemical texts are to be used for mechanized indexing. The completion of the detailed grammar of chemical nomenclature would permit not only the calculation of molecular formulas but also the generation of structural diagrams, systematic names, line notations, and other information required in machine searching systems. With suitable modifications the procedures could easily be applied to foreign nomenclature. The field of chemico-linguistics is of interest to the organic chemist as it can improve methods for teaching nomenclature.
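The transformation aminoRane = Rylamine can be sketched in Python as two surface patterns that reduce to the same alkyl morpheme and therefore to the same formula. This is an illustrative sketch under stated assumptions, not the thesis's grammar: the four-entry `ROOTS` table, the regular expressions, and the function names `amine_root` and `amine_formula` are all hypothetical, and only unbranched primary amines are covered.

```python
import re

# Hypothetical alkyl-morpheme table (carbon counts); illustration only.
ROOTS = {"meth": 1, "eth": 2, "prop": 3, "but": 4}
_ROOT_ALTERNATION = "|".join(ROOTS)

# Two surface forms related by the transformation aminoRane = Rylamine.
AMINO_FORM = re.compile(rf"^amino({_ROOT_ALTERNATION})ane$")
YLAMINE_FORM = re.compile(rf"^({_ROOT_ALTERNATION})ylamine$")


def amine_root(name: str) -> str:
    """Reduce either surface form of a primary amine name to its alkyl morpheme R."""
    name = name.lower().strip()
    for pattern in (AMINO_FORM, YLAMINE_FORM):
        match = pattern.match(name)
        if match:
            return match.group(1)
    raise ValueError(f"name not covered by this sketch: {name!r}")


def amine_formula(name: str) -> str:
    """Formula of an unbranched primary amine R-NH2 with n carbons: CnH(2n+3)N."""
    carbons = ROOTS[amine_root(name)]
    carbon_part = "C" if carbons == 1 else f"C{carbons}"
    return f"{carbon_part}H{2 * carbons + 3}N"


if __name__ == "__main__":
    # Both members of a transformational pair yield the same formula.
    print(amine_formula("aminoethane"), amine_formula("ethylamine"))
```

Treating the two names as transforms of one underlying form, rather than as separate dictionary entries, is what keeps the formula calculation independent of which naming system produced the name.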
Similarly, for the linguist, chemical nomenclature is a fertile field of study. One can control the experimental conditions more easily than in normal discourse. However, conclusions can be drawn which may have more general application.
Similar resources
A PARTICLE SWARM OPTIMIZATION ALGORITHM TO SUGGEST FORMULAS FOR THE BEHAVIOUR OF THE RECYCLED MATERIAL REINFORCED CONCRETE BEAMS
Reducing waste material plays an essential role for engineers in the current world. Nowadays, recycled materials are going to be used in order to manufacture concrete beams. Previous studies concluded that the currently proposed formulas to predict the flexural and shear behavior of the reinforced concrete beams were not appropriate for those manufactured by recycled materials. This study aims ...
Formalizing Symbolic Decision Procedures for Regular Languages
This thesis studies decision procedures for the equivalence of regular languages represented symbolically as regular expressions or logical formulas. Traditional decision procedures in this context rush to dispose of the concise symbolic representation by translating it into finite automata, which then are efficiently minimized and checked for structural equality. We develop procedures that avo...
DBCHEM: A Database Query Based Solution for the Chemical Compound and Drug Name Recognition Task
We propose a method, named DBCHEM, based on database queries for the chemical compound and drug name recognition task of the BioCreative IV challenge. We prepared a database with 145 million entries containing compound and drug names, their synonyms, and molecular formulas. PubChem Power User Gateway (PUG) system is used to construct the database. Candidate chemical and drug names are identifie...
QSAR models to predict physico-chemical Properties of some barbiturate derivatives using molecular descriptors and genetic algorithm- multiple linear regressions
In this study the relationship between choosing appropriate descriptors by genetic algorithm to the Polarizability (POL), Molar Refractivity (MR) and Octanol/water Partition Coefficient (LogP) of barbiturates is studied. The chemical structures of the molecules were optimized using ab initio 6-31G basis set method and Polak-Ribiere algorithm with conjugated gradient within HyperChem 8.0 environ...
Frame Labeling of Competing Narratives in Journalistic Translation
Studying translations during the time of conflict has gained currency in the recent decade in translation studies. One of the cases in which conflict manifests itself is in the way different countries choose to name an event or a geographical location, for example. This study set out to understand how translation of rival names and labeling was carried out in Iranian state-run news agencies. To...
Publication date: 1999